Model Strategy — The Ford Estate
Updated: 2026-03-01
Philosophy
Never pay for what you can get free. Never use a sledgehammer when a scalpel works.
The goal: route every request to the cheapest/fastest model that can handle it,
spread usage across providers to avoid hitting limits, and reserve paid models
for tasks that genuinely need them.
Provider Tiers (by cost)
Tier 0 — FREE (use first, always)

| Provider | Best Models | Limits | Best For |
|---|---|---|---|
| NVIDIA NIM | GLM 5, Kimi K2.5, DeepSeek V3.2, Qwen3 Coder 480B | 40 RPM | Everything — S+ tier models for free |
| Cerebras | GPT-OSS-120B, Qwen3-235B, GLM-4.7 | 30 RPM, 1M TPD | Speed-critical tasks (fastest inference) |
| Groq | Kimi K2, Llama 3.3 70B, Qwen3-32B | 1K RPD, 12K TPM | Quick lookups, triage |
| OpenRouter (free) | 29 free models | 50 req/day | Overflow/fallback |
| Codestral | codestral-latest | 30 RPM, 2K RPD | Code generation/completion |
| SambaNova | Llama 3.3 70B, DeepSeek V3 | Free tier | Fast inference fallback |
Tier 1 — CHEAP (pennies per task)

| Provider | Best Models | Cost | Best For |
|---|---|---|---|
| Gemini Flash | gemini-2.5-flash | ~$0.075/M input | Daily driver, coordination |
| Mistral Small | mistral-small-latest | Free (4M tokens/month cap) | European routing, fallback |
| Together AI | Various open models | $25 credit | Batch work, coding |
Tier 2 — MODERATE (use thoughtfully)

| Provider | Best Models | Cost | Best For |
|---|---|---|---|
| OpenAI GPT-4o-mini | gpt-4o-mini | ~$0.15/M input | When OpenAI quality is needed |
| Mistral Medium | mistral-medium-latest | Moderate | Strong all-rounder |
| Gemini Pro | gemini-2.5-pro | ~$1.25/M input | Long context, multimodal |
Tier 3 — PREMIUM (last resort for complex tasks)

| Provider | Best Models | Cost | Best For |
|---|---|---|---|
| OpenAI GPT-4o | gpt-4o | ~$2.50/M input | When you need the OpenAI flagship |
| OpenAI o3 | o3 | ~$10/M input | Deep reasoning |
| Anthropic Opus | claude-opus-4-6 | ~$15/M input | Most complex reasoning, long agentic runs |
Agent Assignments
Ada (Coordinator) — needs speed + awareness
- Primary: NVIDIA NIM kimi-k2.5 (free, S+ tier, fast)
- Fallback 1: Cerebras gpt-oss-120b (free, fastest inference)
- Fallback 2: google/gemini-2.5-flash (cheap, reliable)
- Complex tasks: Spawn Opus subagents (don't run Opus in the main session)
K2 (Tech/Homelab) — needs coding + technical accuracy
- Primary: NVIDIA NIM deepseek-v3.2 or Codestral (free, top coding)
- Fallback 1: Cerebras qwen-3-235b (free, strong reasoning)
- Fallback 2: google/gemini-2.5-flash
- Complex infra: Spawn Opus subagent
Cora (Real Estate) — needs clarity + professionalism
- Primary: NVIDIA NIM kimi-k2.5 (free, excellent writing)
- Fallback 1: google/gemini-2.5-flash
- Fallback 2: mistral/mistral-medium-latest
Winston (Family Butler) — needs warmth + reliability
- Primary: google/gemini-2.5-flash (cheap, warm tone)
- Fallback 1: NVIDIA NIM kimi-k2.5
- Fallback 2: Groq llama-3.3-70b-versatile (fast for reminders)
Synergy (Wife's Assistant) — needs warmth + privacy
- Primary: google/gemini-2.5-flash
- Fallback 1: NVIDIA NIM kimi-k2.5
- Fallback 2: Groq llama-3.3-70b-versatile
Subagents / Cron / Overnight
- Cheap work: Cerebras or Groq (free)
- Coding tasks: Codestral or NVIDIA NIM qwen3-coder-480b
- Research: google/gemini-2.5-flash or NVIDIA NIM
- Complex analysis: anthropic/claude-opus-4-6 (sparingly)
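The per-agent chains above can be expressed as data plus a retry loop. This is a minimal sketch: the provider/model IDs mirror the assignments, but the transport is passed in as a `call` callable because each provider needs its own client; `call` and its error behavior are assumptions, not an existing API.

```python
# Hypothetical fallback-chain router; provider/model IDs come from the
# agent assignments above, everything else is illustrative.
AGENT_CHAINS = {
    "ada":     ["nim/kimi-k2.5", "cerebras/gpt-oss-120b", "google/gemini-2.5-flash"],
    "k2":      ["nim/deepseek-v3.2", "cerebras/qwen-3-235b", "google/gemini-2.5-flash"],
    "cora":    ["nim/kimi-k2.5", "google/gemini-2.5-flash", "mistral/mistral-medium-latest"],
    "winston": ["google/gemini-2.5-flash", "nim/kimi-k2.5", "groq/llama-3.3-70b-versatile"],
    "synergy": ["google/gemini-2.5-flash", "nim/kimi-k2.5", "groq/llama-3.3-70b-versatile"],
}

def route(agent: str, prompt: str, call) -> str:
    """Try each model in the agent's chain until one succeeds."""
    last_err = None
    for model in AGENT_CHAINS[agent]:
        try:
            return call(model, prompt)  # a rate limit or outage raises here
        except Exception as err:
            last_err = err              # remember the failure, try the next model
    raise RuntimeError(f"all providers failed for {agent}") from last_err
```

Keeping the chains as plain data means an agent's routing can be changed without touching the retry logic.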
Load Balancing Strategy
Daily Token Budget (rough targets)
- Free providers first: ~80% of daily requests
- Gemini Flash: ~15% (overflow from free tier limits)
- Premium (Opus/GPT-4o/o3): ~5% (complex only)
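As a sanity check, the 80/15/5 split implies a very low blended input price. Treating the request split as a token split is an approximation, and the $10/M premium figure is an assumed average across o3/GPT-4o/Opus, not a quoted price:

```python
# Back-of-envelope blended input cost implied by the 80/15/5 split.
split = {"free": 0.80, "flash": 0.15, "premium": 0.05}   # share of traffic
price = {"free": 0.00, "flash": 0.075, "premium": 10.0}  # $ per M input tokens (premium is an assumed average)

blended = sum(split[t] * price[t] for t in split)
print(f"blended: ${blended:.2f} per M input tokens")     # roughly $0.51/M
```

Almost all of that ~$0.51/M comes from the 5% premium slice, which is why the "never Opus for simple tasks" rule matters.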
Limit Avoidance
- Rotate across free providers: NIM → Cerebras → Groq → SambaNova → OpenRouter
- When one hits its rate limit, fail over to the next
- Track daily usage per provider
- Groq has the lowest TPM (12K) — use it for short exchanges only
- Cerebras's 1M TPD is generous — prefer it for longer contexts
- NIM's 40 RPM is generous — it can be the default
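One way to implement the rotation is a sliding-window counter per provider. A sketch under stated assumptions: the NIM and Cerebras RPM caps come from the Tier 0 table, while Groq, SambaNova, and OpenRouter actually have day-level limits, so their per-minute values here are placeholders.

```python
import time
from collections import deque

class ProviderRotator:
    """Pick the first provider in the rotation that is under its per-minute cap."""

    def __init__(self, rpm_limits, now=time.monotonic):
        self.limits = dict(rpm_limits)               # provider -> RPM cap
        self.sent = {p: deque() for p in rpm_limits}  # timestamps of recent requests
        self.now = now                                # injectable clock, eases testing

    def pick(self):
        t = self.now()
        for provider, cap in self.limits.items():     # dict insertion order = rotation order
            window = self.sent[provider]
            while window and t - window[0] >= 60:     # drop requests older than 60 s
                window.popleft()
            if len(window) < cap:
                window.append(t)
                return provider
        return None  # everything saturated: wait, or spill to the paid tier

# Placeholder caps for the day-limited providers; NIM/Cerebras match the table.
rotator = ProviderRotator({"nim": 40, "cerebras": 30, "groq": 10,
                           "sambanova": 10, "openrouter": 2})
```

Because the state is in-memory and per-process, a multi-process setup would need to share these counters (or just over-provision the margins).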
When to Escalate
- Simple Q&A, routing, reminders → Free tier (NIM/Cerebras/Groq)
- Summarization, writing, analysis → Gemini Flash
- Multi-step reasoning, planning → Gemini Pro or spawn Opus subagent
- Code review, complex debugging → Codestral or Opus subagent
- Never run Opus for simple tasks. Ever.
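The escalation ladder reduces to a lookup with a deliberately cheap default. The task labels and tier strings below are assumptions made up for illustration; the point is the shape of the rule, especially that unknown tasks default down, not up:

```python
# The escalation ladder above as a lookup table (labels are hypothetical).
ESCALATION = {
    "qa": "free",                 # simple Q&A, routing, reminders
    "routing": "free",
    "reminder": "free",
    "summarization": "gemini-flash",
    "writing": "gemini-flash",
    "analysis": "gemini-flash",
    "planning": "gemini-pro or opus-subagent",
    "multi_step_reasoning": "gemini-pro or opus-subagent",
    "code_review": "codestral or opus-subagent",
    "debugging": "codestral or opus-subagent",
}

def escalate(task_type: str) -> str:
    # Unknown task types fall back to the free tier: never Opus for simple tasks.
    return ESCALATION.get(task_type, "free")
```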
Pending Activations
- DeepSeek: $0 balance (top up for cheapest reasoning: $0.14/M)
- xAI: No credits (buy for Grok 3 access)
- Z.AI: Depleted (top up for GLM-5 direct access)
- DashScope: Models not activated (enable in console)
- HuggingFace: Token valid, endpoint routing needs config